---
layout: post
title: Machine learning in COMET experiment (part II)
date: '2015-07-05T16:21:00.001-07:00'
author: Alex Rogozhnikov
tags:
- Machine Learning
- High Energy Physics
- Gradient Boosting
- tracking
- COMET
modified_time: '2015-07-04T17:00:15.185-07:00'
blogger_id: tag:blogger.com,1999:blog-307916792578626510.post-5396254184769962082
blogger_orig_url: http://brilliantlywrong.blogspot.com/2015/07/machine-learning-in-comet-experiment.html
---

In <a href="{% post_url 2015-06-22-machine-learning-used-in-tracking-of %}">previous post</a>
I've written a short explanation of COMET - a Japanese experiment in particle physics.
This second post will be devoted to machine learning approach we developed for
<i>tracking</i>.

<br />
<br />
<h3>Local wire features</h3>
<p>
    The data we should rely on when reconstructing tracks consists of:
</p>
<ul>
    <li>energy deposit for each wire</li>
    <li>time of deposit for each wire</li>
</ul>
<p>
    Let's have a look at their distributions:
</p>
<table class="image-wrapper">
    <tbody>
    <tr>
        <td><img border="0" height="269" src="/images/COMETdepositions.png" width="320"/>
        </td>
    </tr>
    <tr>
        <td>Energy deposited on each 'wire' by signal and background
            tracks
        </td>
    </tr>
    </tbody>
</table>
<table class="image-wrapper">
    <tbody>
    <tr>
        <td ><img border="0" height="248" src="/images/COMETtiming.png"
                                             style="margin-left: auto; margin-right: auto;" width="320"/></td>
    </tr>
    <tr>
        <td >Time elapsed starting from the moment of triggering till the
            time when particles are detected.
        </td>
    </tr>
    </tbody>
</table>

<p>
How do we obtain these values?
</p>

<p>
COMET uses <a href="https://en.wikipedia.org/wiki/Straw_chamber">straw chambers</a> (which we call <i>wires</i> for simplicity),
these are long tubes, which fill detector.
They are filled with gas, which is ionized by moving charged particles (mostly, electrons).
</p>

<p>
After ionization, electrons and ions are moving to opposite directions,
so we probably can estimate the moment of ionization by drift time
(which is of course much greater compared to time of particle flight in detector).
Drift time also gives approximate information about the distance between the center of wire and track.
</p>

<table class="image-wrapper">
    <tbody>
    <tr>
        <td ><img border="0" height="305" src="/images/StrawStructure.jpg"
                                             width="400"/></td>
    </tr>
    <tr>
        <td >Straw tubes used in many detector systems</td>
    </tr>
    </tbody>
</table>
<p>
    Finally, each wire has one more characteristic, namely, the distance to target:<br/>
</p>
<table class="image-wrapper">
    <tbody>
    <tr>
        <td><img border="0" height="262" src="/images/COMETlayer.jpg" width="320"/>
        </td>
    </tr>
    <tr>
        <td >Signal hits are more frequent on the inner layers by
            construction of experiment.
        </td>
    </tr>
    </tbody>
</table><br/><br/><br/>
<p>The time is counted from the moment of when trigger worked (it's another subdetector system). The event is 'recorded' starting from that moment.
    </p>

<p>The basic algorithm that was proposed (baseline algorithm) is using the cut on energy deposition. As you can see, there is really significant difference: energy deposited by background hits is higher. The reason is that background particles are moving slower, so they ionize more particles.
</p>
<p>
ROC AUC when we use the only feature - energy, is around 0.95, which seems to be very high. Nevertheless, it's not enough, since we have around 4400 wires, of which around 1000 gets activated (this number is called occupancy) within each measurement (event), while the signal track usually contains around 80 points.
</p>
<p>
    In other words, event is represented using 4400 pairs (energy, time), of which most are zeros.<br/>
</p>
<p>
    And the noise which passes through such basic filter is still very significant and there is a large room for improvements.
</p>

<p>
First, let's combine all the information we have about the single wire (distance from center, time and energy deposition), let's call them wire features:
</p>

<table class="image-wrapper">
    <tbody>
    <tr>
        <td><img border="0" height="285" src="/images/COMETlocalsimplerocs.jpg" width="320"/></td>
    </tr>
    <tr>
        <td>ROC curves (using physical notation here), we see that usage
            of wire features made us able to twice decrease the background efficiency.&nbsp;</td>
    </tr>
    </tbody>
</table>
<h3>Neighbors features</h3>
<p>
Let me remind you, how the whole picture of hits in COMET detector looks like (we use here projection on the plane, orthogonal to beam line):
</p>
<table class="image-wrapper">
    <tbody>
    <tr>
        <td><img border="0" height="318" src="/images/COMET2dprojection.png" width="320"/></td>
    </tr>
    <tr>
        <td class="tr-caption" style="text-align: center;"><br/>Blue dots designate background hits, most of which are
            tracks of different charged particles, say protons; but some of them are simply noise.<br/>Signal hits,
            which we are looking for, &nbsp;are red points forming an arc of circle. It's approximate radius is known
            ahead.
        </td>
    </tr>
    </tbody>
</table>

<p>
 Simple but useful observation one can see in&nbsp;this plot: signal hits nearly always come together. <br/>
 This implies that we&nbsp;can try using features collected from neighbours of&nbsp;point.<br/>
</p>
<p>
 This drives to&nbsp;dramatic improvement of&nbsp;classification: we&nbsp;almost get rid of&nbsp;random noise tracks, still there are some coupled misidentified tracks. AUC&nbsp;is about 0.995, fpr (background efficiency) decreased by&nbsp;<nobr>4-5 times.</nobr> In&nbsp;principle it&nbsp;suffices to&nbsp;use only information from left and right wires from the same layer.
</p>
<p>
    Ok, I&nbsp;believe that was simple. Is&nbsp;there still something we&nbsp;missed and that could be&nbsp;improved? <br/>
 For sure, this is&nbsp;the whole shape of&nbsp;the track. It’s time to&nbsp;try using this information.<br/>
</p>

<h3>Hough transform and circle shape</h3>
<p>
<a href="https://en.wikipedia.org/wiki/Hough_transform">Hough transform</a> was initially developed to&nbsp;detect lines and circles. Being quite trivial, it&nbsp;is&nbsp;one of&nbsp;the most effective algorithms in&nbsp;high energy physics. <br/>
</p>
<p>

 After using GBDT trained on&nbsp;wire features and features of&nbsp;its neighbours, we&nbsp;are getting quite clean picture with very few false positives. All we&nbsp;want is&nbsp;to&nbsp;detect and remove possible isolated ’islands’ with misidentified background hits. <br/>
</p>
<p>
 This is&nbsp;done by&nbsp;very approximate reconstruction of&nbsp;track centers. Since we&nbsp;know approximate radius of&nbsp;track center, we&nbsp;can use the Hough transform with fixed radius. It&nbsp;looks like:
</p>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMEThoughradius.png" /></td>
		</tr>
		<tr>
			<td>Visualization of&nbsp;Hough transform for circles with fixed radius. We&nbsp;are trying to&nbsp;reconstruct the center of&nbsp;track, going through red points. Assuming that we&nbsp;know the radius of&nbsp;fitted track, all possible centers are laying on&nbsp;the circle with center in&nbsp;red point. </td>
		</tr>
	</tbody>
</table>

<p>
 We&nbsp;discretize the space of&nbsp;possible track centers and for each point we&nbsp;reconstruct how likely it&nbsp;is&nbsp;the center of&nbsp;track. It&nbsp;is&nbsp;done using sparse matrices and some normalization + regularization (because otherwise tracks with few or&nbsp;many points will have very low/high probabilities).
</p>
<p>

 Once we&nbsp;computed Hough transform, we&nbsp;leave only those centers, where hough transform is&nbsp;high, applying some nonlinear transformation there and applying inverse hough + some filtering. This way we&nbsp;obtain for each wire the probability that it&nbsp;belongs to&nbsp;some signal track.
</p>
<p>
 Finally, we&nbsp;collect all the information we&nbsp;get for each wire:
</p>
<ul>
	<li>local features (energy deposition, timing, layer_id)</li>
	<li>features collected from neighbors</li>
	<li>result of&nbsp;inverse hough transform</li>
</ul>
<p>And train GBDT on&nbsp;these features to&nbsp;obtain final classifier. It’s ROC&nbsp;AUC&nbsp;is 0.9993 (100 times less probability of&nbsp;misordering) </p>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMETfinalgbdt.png" width="400" height="356"/></td>
		</tr>
		<tr>
			<td>Final classifier is&nbsp;red and it&nbsp;is&nbsp;very close to&nbsp;ideal one. The ROC&nbsp;AUC&nbsp;is about 0.9993 </td>
		</tr>
	</tbody>
</table>

<p>When we&nbsp;are comparing ROC curves at&nbsp;the threshold of&nbsp;interest (with very high signal sensitivity), things are bit worse, but still very impressing: </p>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMETfinalroccurves.png" width="400" height="305"/></td>
		</tr>
		<tr>
			<td>At&nbsp;stable benchmark, the background yield decreased by&nbsp;factor of&nbsp;34. Original ROC curve is&nbsp;not seen on&nbsp;the plot, since it&nbsp;is&nbsp;much lower. </td>
		</tr>
	</tbody>
</table>

<h3>Visualization of&nbsp;all steps</h3>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMETtracks1.png" width="400" height="396"/></td>
		</tr>
		<tr>
			<td>Initial picture of&nbsp;hits in&nbsp;CyDET. Red are signal tracks, blue are background ones. </td>
		</tr>
	</tbody>
</table>
<div><br/>
</div>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMETtracks2.png" width="400" height="396"/></td>
		</tr>
		<tr>
			<td>After we&nbsp;apply initial GBDT (which uses wire features + neighbors), we&nbsp;have some bck hits (to&nbsp;the right, for instance). See the ’island’ of&nbsp;misidentified background to&nbsp;the right.<br/>
 By&nbsp;greed dots we&nbsp;denote the possible centers of&nbsp;tracks. The bigger the point, the greater value of&nbsp;Hough transform image.<br/>
				<br/>
			</td>
		</tr>
	</tbody>
</table>
<table class="image-wrapper">
	<tbody>
		<tr>
			<td><img src="/images/COMETtracks3.png" width="400" height="396"/></td>
		</tr>
		<tr>
			<td>Now we&nbsp;apply some non-linear transformations to&nbsp;leave only centers with very high probability. Then applying inverse hough transform and apply second GBDT, which incorporates also information from inverse Hough transform. </td>
		</tr>
	</tbody>
</table>

<h3>Conclusion</h3>

<p>
I’ve described how simple machine learning techniques coupled with well-known algorithms can produce very good results,
superior to&nbsp;many complex approaches.
</p>


<h3>Links</h3>

<ol>
	<li><a href="https://github.com/e-gillies-ix/track-finding-yandex">Repository</a> with algorithms and results of&nbsp;tracking.<br/>
 Most of&nbsp;plots are taken from&nbsp;it, thanks to&nbsp;Ewen Gillies.
	</li>
	<li><a href="http://github.com/yandex/REP">Reproducible experiment platform</a> used in&nbsp;experiments, gradient boosting was used from <a href="https://github.com/scikit-learn/scikit-learn">scikit-learn</a></li>
	<li><a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.382.1310&amp;rep=rep1&amp;type=pdf">Review of&nbsp;straw chambers</a>. Straw chambers are used within many different experiments due to&nbsp;their high resolution and cheapness. </li>
</ol>